Unsupervised Multilingual Grammar Induction

Authors

  • Benjamin Snyder
  • Tahira Naseem
  • Regina Barzilay
Abstract

We investigate the task of unsupervised constituency parsing from bilingual parallel corpora. Our goal is to use bilingual cues to learn improved parsing models for each language and to evaluate these models on held-out monolingual test data. We formulate a generative Bayesian model which seeks to explain the observed parallel data through a combination of bilingual and monolingual parameters. To this end, we adapt a formalism known as unordered tree alignment to our probabilistic setting. Using this formalism, our model loosely binds parallel trees while allowing language-specific syntactic structure. We perform inference under this model using Markov Chain Monte Carlo and dynamic programming. Applying this model to three parallel corpora (Korean-English, Urdu-English, and Chinese-English), we find substantial performance gains over the CCM model, a strong monolingual baseline. On average, across a variety of testing scenarios, our model achieves an 8.8 absolute gain in F-measure.

Similar resources

Shared Logistic Normal Distributions for Soft Parameter Tying in Unsupervised Grammar Induction

We present a family of priors over probabilistic grammar weights, called the shared logistic normal distribution. This family extends the partitioned logistic normal distribution, enabling factored covariance between the probabilities of different derivation events in the probabilistic grammar, providing a new way to encode prior knowledge about an unknown grammar. We describe a variational EM ...

Multilingual Dependency Parsing: Using Machine Translated Texts instead of Parallel Corpora

This paper revisits the projection-based approach to the dependency grammar induction task. Traditional cross-lingual dependency induction tasks, one way or the other, depend on the existence of bitexts or target language tools such as part-of-speech (POS) taggers to obtain reasonable parsing accuracy. In this paper, we transfer dependency parsers using only approximate resources, i.e., machine tran...

Unsupervised Analysis of Structured Human Artifacts

By Taylor Berg-Kirkpatrick. Doctor of Philosophy in Computer Science, University of California, Berkeley. Professor Dan Klein, Chair. The presence of hidden structure in human data (including natural language but also sources like music, historical documents, and other complex artifacts) makes this data extremely difficult to analyze. In this thesis...

Phylogenetic Grammar Induction

We present an approach to multilingual grammar induction that exploits a phylogeny-structured model of parameter drift. Our method does not require any translated texts or token-level alignments. Instead, the phylogenetic prior couples languages at a parameter level. Joint induction in the multilingual model substantially outperforms independent learning, with larger gains both from more articu...

Learning Common Grammar from Multilingual Corpus

We propose a corpus-based probabilistic framework to extract hidden common syntax across languages from non-parallel multilingual corpora in an unsupervised fashion. For this purpose, we assume a generative model for multilingual corpora, where each sentence is generated from a language-dependent probabilistic context-free grammar (PCFG), and these PCFGs are generated from a prior grammar that i...

The Shared Logistic Normal Distribution for Grammar Induction

We present a shared logistic normal distribution as a Bayesian prior over probabilistic grammar weights. This approach generalizes the similar use of logistic normal distributions [3], enabling soft parameter tying during inference across different multinomials comprising the probabilistic grammar. We show that this model outperforms previous approaches on an unsupervised dependency grammar ind...

Publication date: 2009